An UCT Approach for Anytime Agent-Based Planning

Authors

  • Damien Pellier
  • Bruno Bouzy
  • Marc Métivier
Abstract

In this paper, we introduce MHSP, a new heuristic search algorithm for anytime planning based on mean values. It combines the principles of UCT, a bandit-based algorithm that has given very good results in computer games, especially Computer Go, with heuristic search, in order to obtain an anytime planner that provides partial plans before finding a solution plan and, further, an optimal plan. The algorithm is evaluated on several classical planning problems and compared with some major planning algorithms. Our results highlight the capacity of MHSP to return partial plans that tend toward an optimal plan over time.
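
As background on the bandit rule that UCT is built on, the following is a minimal sketch of UCB1 action selection, which UCT applies at each node of its search tree. It is not the MHSP algorithm from the paper, whose details the abstract does not give. The function name `ucb1_select`, the `stats` layout, and the default exploration constant `c` are assumptions made for illustration.

```python
import math


def ucb1_select(stats, c=math.sqrt(2)):
    """Pick an action with the UCB1 rule.

    stats maps each action to (visits, total_reward); this layout is a
    hypothetical choice for the sketch, not something taken from the paper.
    """
    total_visits = sum(visits for visits, _ in stats.values())
    best_action, best_score = None, -math.inf
    for action, (visits, total_reward) in stats.items():
        if visits == 0:
            # An untried action is always selected first.
            return action
        mean = total_reward / visits  # exploitation term (mean value)
        bonus = c * math.sqrt(math.log(total_visits) / visits)  # exploration term
        score = mean + bonus
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With `c=0` the rule reduces to picking the action with the highest mean value; a larger `c` favors rarely tried actions.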

Similar resources

UCT for Tactical Assault Planning in Real-Time Strategy Games

We consider the problem of tactical assault planning in real-time strategy games where a team of friendly agents must launch an assault on an enemy. This problem offers many challenges including a highly dynamic and uncertain environment, multiple agents, durative actions, numeric attributes, and different optimization objectives. While the dynamics of this problem are quite complex, it is ofte...

Trial-Based Heuristic Tree Search for Finite Horizon MDPs

Dynamic programming is a well-known approach for solving MDPs. In large state spaces, asynchronous versions like Real-Time Dynamic Programming (RTDP) have been applied successfully. If unfolded into equivalent trees, Monte-Carlo Tree Search algorithms are a valid alternative. UCT, the most popular representative, obtains good anytime behavior by guiding the search towards promising areas of the...

Trial-Based Heuristic Tree-search for Distributed Multi-Agent Planning

We present a novel search scheme for privacy-preserving multi-agent planning. Inspired by UCT search, the scheme is based on growing an asynchronous search tree by running repeated trials through the tree. We describe key differences to classical multi-agent forward search, discuss theoretical properties of the presented approach, and evaluate it based on benchmarks from the CoDMAP competition.

Online Planning for Ad Hoc Autonomous Agent Teams

We propose a novel online planning algorithm for ad hoc team settings—challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using t...

Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents

Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly overcome this finite-horizon approximation by applying an evaluation function at the leaf-states of the planning tree. Recent work has proposed an alternative approach for overcoming computational constraints on agent design: modify the reward function. In ...

Publication date: 2010